Harp-DAAL: A High Performance Data-Intensive Machine Learning Framework

Authors

  • Langshi Chen
  • Judy Qiu
Abstract

Nowadays, many data analytics and machine learning problems involve millions or billions of training data points and model parameters, so the Distributed Processing mode is the only practical choice for many applications. Within DAAL's framework, the communication layer of the Distributed Processing mode is left to the user, who may choose Hadoop, Spark, MPI, or any user-defined middleware. The goal of our project is thus to integrate Harp (a Hadoop plugin) into the Distributed Processing mode of DAAL. Harp has the following advantages:

Similar resources

Development of Harp-DAAL Interface

Nowadays, many data analytics and machine learning problems involve millions or billions of training data points and model parameters, so the Distributed Processing mode is the only practical choice for many applications. Within DAAL's framework, the communication layer of the Distributed Processing mode is left to the user, who may choose Hadoop, Spark, MPI, or any user-defined middleware...

Image Classification via Sparse Representation and Subspace Alignment

Image representation is a crucial problem in image processing, where there exist many low-level representations of images, e.g., SIFT, HOG and so on. But there is a missing link between low-level and high-level semantic representations. In fact, traditional machine learning approaches, e.g., non-negative matrix factorization, sparse representation and principal component analysis are employed to d...

Enhancing Learning from Imbalanced Classes via Data Preprocessing: A Data-Driven Application in Metabolomics Data Mining

This paper presents a data mining application in metabolomics. It aims at building an enhanced machine learning classifier that can be used for diagnosing cachexia syndrome and identifying its involved biomarkers. To achieve this goal, a data-driven analysis is carried out using a public dataset consisting of 1H-NMR metabolite profile. This dataset suffers from the problem of imbalanced classes...

BioWorkbench: A High-Performance Framework for Managing and Analyzing Bioinformatics Experiments

Advances in sequencing techniques have led to exponential growth in biological data, demanding the development of large-scale bioinformatics experiments. Because these experiments are computation- and data-intensive, they require high-performance computing (HPC) techniques and can benefit from specialized technologies such as Scientific Workflow Management Systems (SWfMS) and databases. In this w...

Publication date: 2017